Effective Arabic Stemmer Based Hybrid Approach for Arabic Text Categorization
Authors
Abstract
Similar resources
Arabic Text Categorization
In this paper, we compare the performance of three classifiers for Arabic text categorization. In particular, the naïve Bayes, k-nearest-neighbors (kNN), and distance-based classifiers were used. Unclassified documents were preprocessed by removing punctuation marks and stopwords. Each document was then represented as a vector of words (or of words and their frequencies as in the case of the naï...
CBAS: Context Based Arabic Stemmer
Arabic morphology encapsulates many valuable features, such as a word's root. Arabic roots are used for many tasks; the process of extracting a word's root is referred to as stemming. Stemming is an essential part of most Natural Language Processing tasks, especially for derivative languages such as Arabic. However, stemming faces the problem of ambiguity, where two or more roots...
Arabic Light Stemmer: A New Enhanced Approach
In general, word stemming is one of the most important factors affecting the performance of information retrieval systems. The optimization issues of the Arabic light stemming algorithm, a main component of natural language processing and information retrieval for the Arabic language, are based on root-pattern schemes. Since Arabic is a highly inflected language and has a complex morphologi...
Unsupervised Stemmer for Arabic Tweets
Stemming is an essential processing step in a wide range of high-level text processing applications, such as information extraction, machine translation, and sentiment analysis. It is used to reduce words to their stems. Many stemming algorithms have been developed for Modern Standard Arabic (MSA). Although Arabic tweets and MSA are closely related and share many characteristics, there are substa...
Word Sense Disambiguation for Arabic Text Categorization
In this paper, we present two contributions for Arabic Word Sense Disambiguation. In the first, we propose using two external resources, AWN and WN, based on a Term to Term Machine Translation System (MTS). The second contribution relates to the disambiguation strategies: it consists of choosing the nearest concept for the ambiguous terms, based on more relationships with different concep...
Journal
Journal title: International Journal of Data Mining & Knowledge Management Process
Year: 2013
ISSN: 2231-007X, 2230-9608
DOI: 10.5121/ijdkp.2013.3401